Tree-Based Classifier Ensembles for PE Malware Analysis: A Performance Revisit

Abstract

Given their escalating number and variety, combating malware is becoming increasingly strenuous. Machine learning techniques are often used in the literature to automatically discover the models and patterns behind such challenges and to create solutions that can keep up with the rapid pace at which malware evolves. This article compares various tree-based ensemble learning methods that have been proposed for the analysis of PE malware. A tree-based ensemble is an unconventional learning paradigm that constructs and combines a collection of base learners (e.g., decision trees), as opposed to the conventional learning paradigm, which aims to construct an individual learner from training data. Several tree-based ensemble techniques, such as random forest, XGBoost, CatBoost, GBM, and LightGBM, are taken into consideration and appraised using different performance measures, such as accuracy, MCC, precision, recall, AUC, and F1. In addition, the experiment includes many public datasets, such as BODMAS, Kaggle, and CIC-MalMem-2022, to demonstrate the generalizability of the classifiers in a variety of contexts. Based on the test findings, all tree-based ensembles performed well, and the performance differences between algorithms are not statistically significant, particularly when their respective hyperparameters are appropriately configured. The tree-based ensemble techniques also outperformed other, similar PE malware detectors published in recent years.
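The abstract's two core ideas, combining several base learners and scoring the result with measures such as accuracy, precision, recall, F1, and MCC, can be illustrated with a minimal sketch. This is not the article's experimental setup: the labels and the three base-learner outputs below are made-up toy values, the helper names `binary_metrics` and `majority_vote` are hypothetical, and simple majority voting only mirrors how bagging-style ensembles such as random forest combine trees (boosting methods such as XGBoost or LightGBM combine learners additively instead). AUC is omitted because it needs predicted scores rather than hard labels.

```python
from math import sqrt


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, F1, and MCC for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


def majority_vote(predictions):
    """Combine per-learner 0/1 predictions by simple majority (ties go to 1)."""
    n_learners = len(predictions)
    return [1 if 2 * sum(votes) >= n_learners else 0
            for votes in zip(*predictions)]


# Toy data (assumed for illustration): 1 = malware, 0 = benign.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
learner_a = [1, 1, 1, 0, 0, 0, 0, 1]
learner_b = [1, 1, 0, 1, 0, 0, 1, 0]
learner_c = [1, 0, 1, 1, 0, 1, 0, 0]

ensemble = majority_vote([learner_a, learner_b, learner_c])

for name, pred in [("learner_a", learner_a), ("learner_b", learner_b),
                   ("learner_c", learner_c), ("ensemble", ensemble)]:
    print(name, binary_metrics(y_true, pred))
```

In this toy case each base learner makes two mistakes, but the mistakes fall on different samples, so the majority vote corrects them. Real base learners are rarely this complementary, so the gain in practice is usually smaller.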

Similar resources

Using Multi-Feature and Classifier Ensembles to Improve Malware Detection

With the rapid growth of internet applications, malware has become one of the major threats to information security. Traditionally, anti-virus products use signature matching to detect malware, but the drawback is that they cannot detect new and unknown malware. Recent studies showed that the use of machine learning can successfully detect new and unknown malware, but the limitation of this tec...

PE-Header-Based Malware Study and Detection

In this paper, I present a simpler and faster approach to distinguish between malware and legitimate .exe files by simply looking at properties of the MS Windows Portable Executable (PE) headers. We extract distinguishing features from the PE headers using the structural information standardized by the Microsoft Windows operating system for executables. I use the following three methodologies: (1)...

A High-Performance Model based on Ensembles for Twitter Sentiment Classification

Background and Objectives: Twitter Sentiment Classification is one of the most popular fields in information retrieval and text mining. Millions of people around the world use social networks like Twitter intensively. Twitter lets users publish tweets to say what they think about topics. There are numerous web sites built on the Internet presenting Twitter. The user can enter a sentiment ta...

Consensus-based combining method for classifier ensembles

In this paper, a new method for combining an ensemble of classifiers, called the Consensus-based Combining Method (CCM), is proposed and evaluated. As in most other combination methods, the outputs of multiple classifiers are weighted and summed together into a single final classification decision. However, unlike the other methods, CCM adjusts the weights iteratively after comparing all of the clas...

Designing Classifier Ensembles with Constrained Performance Requirements

Classification requirements for real-world classification problems are often constrained by a given true positive or false positive rate to ensure that the classification error for the most important class is within a desired limit. For a sufficiently high true positive rate, this may result in the set-point being located somewhere in the flat portion of the ROC curve where the associated false...

Journal

Journal title: Algorithms

Year: 2022

ISSN: 1999-4893

DOI: https://doi.org/10.3390/a15090332